Using Twitter to Measure Public Discussion of Diseases: A Case Study.
Abstract
BACKGROUND Twitter is increasingly used to estimate disease prevalence, but such measurements can be biased, due to both biased sampling and the inherent ambiguity of natural language.
OBJECTIVE We characterized the extent of these biases and how they vary with disease.
METHODS We correlated self-reported prevalence rates for 22 diseases from Experian's Simmons National Consumer Study (n=12,305) with the number of times these diseases were mentioned on Twitter during the same period (2012). We also identified and corrected for two types of bias present in Twitter data: (1) demographic variance between US Twitter users and the general US population; and (2) natural language ambiguity, which creates the possibility that mention of a disease name may not actually refer to the disease (eg, "heart attack" on Twitter often does not refer to myocardial infarction). We measured the correlation between disease prevalence and Twitter disease mentions both with and without bias correction. This allowed us to quantify each disease's overrepresentation or underrepresentation on Twitter, relative to its prevalence.
RESULTS Our sample included 80,680,449 tweets. Adjusting disease prevalence to correct for Twitter demographics more than doubles the correlation between Twitter disease mentions and disease prevalence in the general population (from .113 to .258, P<.001). In addition, diseases varied widely in how often mentions of their names on Twitter actually referred to the diseases, from 14.89% (3827/25,704) of instances (for stroke) to 99.92% (5044/5048) of instances (for arthritis). Applying ambiguity correction to our Twitter corpus achieves a correlation between disease mentions and prevalence of .208 (P<.001). Simultaneously applying correction for both demographics and ambiguity more than triples the baseline correlation to .366 (P<.001). Compared with prevalence rates, cancer appeared most overrepresented on Twitter, whereas high cholesterol appeared most underrepresented.
CONCLUSIONS Twitter is a potentially useful tool to measure public interest in and concerns about different diseases, but when comparing diseases, improvements can be made by adjusting for population demographics and word ambiguity.
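The arithmetic behind the reported figures can be checked with a short sketch. The counts and correlations below are the ones quoted in the abstract. The function names, and the idea of scaling a raw mention count by the share of on-topic mentions, are illustrative assumptions; the abstract does not specify the authors' exact correction procedure.

```python
# Counts quoted in the abstract: (mentions that referred to the disease, mentions examined).
RELEVANT_MENTIONS = {
    "stroke": (3827, 25704),
    "arthritis": (5044, 5048),
}

# Correlations quoted in the abstract (Twitter mentions vs. prevalence).
BASELINE_R = 0.113
DEMOGRAPHIC_R = 0.258
COMBINED_R = 0.366


def relevance_fraction(relevant, examined):
    """Share of mentions of a disease name that actually refer to the disease."""
    return relevant / examined


def ambiguity_corrected(raw_mentions, relevant, examined):
    """Hypothetical correction: scale a raw count by the on-topic share.

    This is an assumed, simplified form of ambiguity correction,
    not necessarily the procedure used in the study.
    """
    return raw_mentions * relevance_fraction(relevant, examined)


def fold_change(baseline, corrected):
    """How many times larger the corrected correlation is than the baseline."""
    return corrected / baseline


if __name__ == "__main__":
    for disease, (relevant, examined) in RELEVANT_MENTIONS.items():
        share = 100 * relevance_fraction(relevant, examined)
        print(f"{disease}: {share:.2f}% of mentions refer to the disease")
    print(f"demographic correction: {fold_change(BASELINE_R, DEMOGRAPHIC_R):.2f}x baseline")
    print(f"combined correction: {fold_change(BASELINE_R, COMBINED_R):.2f}x baseline")
```

Under this reading, a disease name such as "stroke" would contribute far fewer effective mentions than its raw count suggests, while "arthritis" would be nearly unchanged.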
Journal: JMIR Public Health and Surveillance
Volume 1, Issue 1
Publication date: 2015